Locally discriminative topic modeling

نویسندگان

  • Hao Wu
  • Jiajun Bu
  • Chun Chen
  • Jianke Zhu
  • Lijun Zhang
  • Haifeng Liu
  • Can Wang
  • Deng Cai
چکیده

Topic modeling is a powerful tool for discovering the underlying or hidden structure in text corpora. Typical algorithms for topic modeling include probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA). Despite their different inspirations, both approaches are instances of generative model, whereas the discriminative structure of the documents is ignored. In this paper, we propose locally discriminative topic model (LDTM), a novel topic modeling approach which considers both generative and discriminative structures of the data space. Different from PLSA and LDA in which the topic distribution of a document is dependent on all the other documents, LDTM takes a local perspective that the topic distribution of each document is strongly dependent on its neighbors. By modeling the local relationships of documents within each neighborhood via a local linear model, we learn topic distributions that vary smoothly along the geodesics of the data manifold, and can better capture the discriminative structure in the data. The experimental results on text clustering and web page categorization demonstrate the effectiveness of our proposed approach. & 2011 Elsevier Ltd. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hallucinating system outputs for discriminative language modeling

Project overview • NSF funded project and recent JHU summer workshop team • General topic: discriminative language modeling for ASR and MT – Learning language models with discriminative objectives • Specific topic: learning models from text only – Enabling use of much more training data; adaptation scenarios • Have made some progress with ASR models (topic today) – Less progress on improving MT...

متن کامل

Open Domain Short Text Conceptualization: A Generative + Descriptive Modeling Approach

Concepts embody the knowledge to facilitate our cognitive processes of learning. Mapping short texts to a large set of open domain concepts has gained many successful applications. In this paper, we unify the existing conceptualization methods from a Bayesian perspective, and discuss the three modeling approaches: descriptive, generative, and discriminative models. Motivated by the discussion o...

متن کامل

Markovian Discriminative Modeling for Dialog State Tracking

Discriminative dialog state tracking has become a hot topic in dialog research community recently. Compared to generative approach, it has the advantage of being able to handle arbitrary dependent features, which is very appealing. In this paper, we present our approach to the DSTC2 challenge. We propose to use discriminative Markovian models as a natural enhancement to the stationary discrimin...

متن کامل

Discriminative Bi-Term Topic Model for Headline-Based Social News Clustering

Social news are becoming increasingly popular. News organizations and popular journalists are starting to use social media more and more heavily for broadcasting news. The major challenge in social news clustering lies in the fact that textual content is only a headline, which is much shorter than the fulltext. Previous works showed that the bi-term topic model (BTM) is effective in modeling sh...

متن کامل

Discriminative Random Field Modeling of Lung Tumors in CT Scans

The ability to conduct high-quality automatic 3D segmentation of tumors in CT scans is of high value to busy radiologists. Discriminative random fields (DRFs) were used to segment 3D volumes of lung tumors in CT scan data. Optimal parameters for the DRF inference were first calculated using gradient ascent. These parameters were then used to solve the inference problem using the graph cuts algo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Pattern Recognition

دوره 45  شماره 

صفحات  -

تاریخ انتشار 2012